Project on Type 2 Diabetic and Hyperglycemic Pancreatic Islets

Introduction

Type 2 Diabetes (T2D) is a serious health concern. Identifying and understanding gene expression patterns associated with the disease can help uncover underlying biological mechanisms and could potentially support earlier detection.

Our analysis is focused on:

  • Identifying gene expression markers

  • Identifying key over- and under-expressed genes

  • Looking into co-expression of key genes

Data source:

“A Systems Genetics Approach Identifies Genes and Pathways for Type 2 Diabetes in Human Islets” (PMID: 22768844) (GEO ID: GDS4337)

Materials - Data Set Description

Data set overview:

  • 14481 different genes

  • 63 pancreatic islet samples

    • 9 with T2D

    • 54 controls

Descriptive statistics:

  • Similar mean gene expression across groups

  • Overall low mean expression levels

  • Slightly right-skewed distribution

  • Next step: test for significant differential expression between the two groups

Materials - Wrangling

Methods - Median Expression Differences

Goal: Identify genes with largest expression differences between T2D and control

Workflow:

  1. Compute median expression per gene for T2D and Control

  2. Calculate absolute differences between group medians per gene

  3. Rank genes by largest absolute difference

  4. Visualize top genes:

    • Bar plot of top 30 genes (median differences)
    • Boxplots of sample-level expression variation for top 10 genes
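
The ranking steps of this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: the function name `rank_by_median_difference`, the dictionary layout, and the toy values are all assumptions made for the example, and plotting is left out.

```python
from statistics import median


def rank_by_median_difference(expression, groups, top_n=30):
    """Rank genes by the absolute difference between group medians.

    expression: dict mapping gene name -> list of per-sample values
    groups:     list of labels ("T2D" or "control"), one per sample
    Returns up to top_n (gene, signed median difference) pairs,
    largest absolute difference first.
    """
    ranked = []
    for gene, values in expression.items():
        t2d = [v for v, g in zip(values, groups) if g == "T2D"]
        control = [v for v, g in zip(values, groups) if g == "control"]
        # Keep the sign so the direction of change is not lost
        ranked.append((gene, median(t2d) - median(control)))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Toy data invented for illustration (not taken from GDS4337)
groups = ["T2D", "T2D", "T2D", "control", "control", "control"]
expression = {
    "GENE_A": [5.0, 6.0, 7.0, 2.0, 3.0, 4.0],  # higher in T2D
    "GENE_B": [1.0, 1.5, 2.0, 1.0, 2.0, 3.0],  # nearly unchanged
    "GENE_C": [0.5, 1.0, 1.5, 4.0, 5.0, 6.0],  # lower in T2D
}
top = rank_by_median_difference(expression, groups, top_n=2)
```

Keeping the signed difference while sorting on its absolute value means the same list can feed both the ranking and a bar plot that shows the direction of change.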

Methods - P-values

Goal: Identify genes whose expression differs significantly between T2D and control

Workflow:

  • Fit a linear regression model per gene

    • lm(log_fold_change ~ disease.state)
  • Interpret the slope: the difference between T2D and control

    • Positive slope -> higher gene expression in T2D

    • Negative slope -> lower gene expression in T2D

  • Select significant genes by p-value
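
Why the slope can be read as a group difference: when the only predictor is a two-level group label, the least-squares slope equals the mean of one group minus the mean of the other, and the intercept equals the mean of the reference group. The sketch below checks this on toy numbers. It assumes control is coded 0 and T2D is coded 1; the helper `ols_slope_intercept` and the values are hypothetical, and the p-value step is not reproduced here.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Toy values for one gene (invented for illustration)
is_t2d = [1, 1, 1, 0, 0, 0, 0]            # assumed coding: 1 = T2D, 0 = control
expr = [2.5, 3.0, 3.5, 1.0, 2.0, 2.0, 3.0]

slope, intercept = ols_slope_intercept(is_t2d, expr)

# Group means computed directly, for comparison with the fit
mean_t2d = mean(e for e, g in zip(expr, is_t2d) if g == 1)
mean_control = mean(e for e, g in zip(expr, is_t2d) if g == 0)
```

Here `slope` matches `mean_t2d - mean_control` and `intercept` matches `mean_control` up to floating-point rounding, so a positive slope corresponds to higher mean expression in the group coded 1.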

Results - Median Expression Differences

Per gene:

  • T2D: 9 samples

  • Control: 54 samples

Results - P-values

  • Final significance threshold: p < 0.01

  • 635 genes labelled significant

  • The 30 most significant genes (lowest p-values) were chosen for visualisation
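
The selection step can be sketched as a filter followed by a sort. The function `select_significant` and the p-values below are made up for illustration; only the threshold (p < 0.01) and the top-N idea come from the slides.

```python
def select_significant(p_values, alpha=0.01, top_n=30):
    """Keep genes with p < alpha and return the top_n smallest p-values.

    p_values: dict mapping gene name -> p-value
    Returns a list of (gene, p-value) pairs, most significant first.
    """
    significant = [(g, p) for g, p in p_values.items() if p < alpha]
    significant.sort(key=lambda item: item[1])
    return significant[:top_n]


# Hypothetical p-values, not results from the project
p_values = {"GENE_A": 0.0004, "GENE_B": 0.2, "GENE_C": 0.009, "GENE_D": 0.01}
selected = select_significant(p_values, alpha=0.01, top_n=2)
```

The comparison is strict, so a gene sitting exactly on the threshold (`GENE_D` here) is not selected.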

Results - Correlation Matrix

  • 22 strong positive correlations (above 0.6)

    • SFRP4 and IL7R

  • 8 strong negative correlations (below -0.6)

    • IL1RL1 and RASGRP1
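
Counting strong pairs from a correlation matrix amounts to checking every gene pair once against the ±0.6 cut-off. The sketch below assumes Pearson correlation, which the slides do not state; `pearson`, `strong_pairs`, and the toy expression values are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strong_pairs(expression, threshold=0.6):
    """Split gene pairs into strongly positive and strongly negative ones."""
    positive, negative = [], []
    # combinations() visits each unordered pair exactly once
    for (g1, v1), (g2, v2) in combinations(expression.items(), 2):
        r = pearson(v1, v2)
        if r > threshold:
            positive.append((g1, g2, r))
        elif r < -threshold:
            negative.append((g1, g2, r))
    return positive, negative


# Toy data invented for illustration (not taken from GDS4337)
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 5.0, 8.0, 9.0],  # rises with GENE_A
    "GENE_C": [9.0, 7.0, 6.0, 3.0, 1.0],  # falls as GENE_A rises
    "GENE_D": [3.0, 1.0, 4.0, 1.0, 3.0],  # no clear relation
}
positive, negative = strong_pairs(expression)
```

Visiting each pair once avoids double-counting the symmetric entries of the matrix and skips the diagonal, where every gene correlates perfectly with itself.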

Discussion

Key takeaways

  1. The two methods produced substantially different results

    • The predominant direction of change (over- vs. under-expression) among the genes found was flipped between the methods

    • Genes found by both methods kept the same direction of change

  2. Four genes clearly stand out in terms of expression differences

    • IL7R

      • Strongly over-expressed and co-expressed with several genes (one of which is an IL ligand)
    • FGF7

      • Also over-expressed and co-expressed with other over-expressed genes
    • GLRA1

      • Under-expressed and negatively correlated with all over-expressed genes
    • RASGRP1

      • Under-expressed and negatively correlated with all over-expressed genes

Conclusion

  • Only small differences between the groups were observed

  • The significance of these differences is unclear

  • Broader context is missing

  • Only rudimentary methods were applied; more advanced analysis is required